>>> a = u"abc"
>>> b = u"abc\u012c"
>>> a.encode("ascii", "ignore").decode("ascii") == a
True
>>> b.encode("ascii", "ignore").decode("ascii") == b
False
>>>
Others may supply more general/elegant/... approaches.
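For reuse, the round-trip check above can be wrapped in a small helper. This is only a sketch: the name is_ascii is made up for illustration, and it assumes the argument is a Unicode string (unicode in Python 2, str in Python 3).

```python
def is_ascii(text):
    # Drop every non-ASCII character, decode back, and compare with the input.
    # The two are equal only if nothing was dropped.
    return text.encode("ascii", "ignore").decode("ascii") == text

print(is_ascii(u"abc"))        # True
print(is_ascii(u"abc\u012c"))  # False
```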
vbr
Bad idea. Use a dict; don't try to pretend that an object is a dict.
This isn't Javascript. Incidentally, inheriting from "dict" works,
and is quite useful.
class item(dict):
    ...

p = item()
p['abc'] = 1
Subclassing dict wasn't possible in early versions of Python (before 2.2),
which led to a style of abusing objects as if they were dictionaries.
Also note that 1) spaces in attribute names can be troublesome, and
2) duplicating the name of a function or built-in attribute will
override it, usually leading to unwanted results.
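A small sketch of both points. Record is a hypothetical attribute-bag class invented here for contrast; it is not from any library. The dict subclass keeps keys in the mapping, so neither problem arises there.

```python
class item(dict):
    """Dict subclass: keys live in the mapping, not in the attribute namespace."""

class Record(object):
    """Hypothetical attribute bag, for contrast only."""
    def describe(self):
        return "a record"

p = item()
p["two words"] = 1       # any hashable key works, including ones with spaces
p["keys"] = 2            # stored as data; the keys() method is untouched
print(sorted(p.keys()))  # ['keys', 'two words']

r = Record()
setattr(r, "two words", 1)   # allowed, but only reachable through getattr()
r.describe = "oops"          # shadows the method on this instance
print(callable(r.describe))  # False
```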
John Nagle
Another way:
# -*- coding: utf-8 -*-
import unicodedata

def test_ascii(struni):
    strasc = unicodedata.normalize('NFD', struni).encode('ascii', 'replace')
    if len(struni) == len(strasc):
        return True
    else:
        return False

print test_ascii(u"abcde")
print test_ascii(u"abcdê")
@-salutations
--
Michel Claveau
-1
Try your code with u"abcd\xa1" ... it says it's ASCII.
Suggestions:
test_ascii = lambda s: len(s.encode('ascii', 'ignore')) == len(s)
or
test_ascii = lambda s: all(c < u'\x80' for c in s)
or
use try/except
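The try/except variant might look like the sketch below. The name is_ascii_strict is made up for illustration, and the code assumes s is a Unicode string.

```python
def is_ascii_strict(s):
    # A strict encode raises UnicodeEncodeError (a subclass of UnicodeError)
    # on the first character outside the ASCII range.
    try:
        s.encode("ascii")
    except UnicodeError:
        return False
    return True

print(is_ascii_strict(u"abcde"))     # True
print(is_ascii_strict(u"abcd\xa1"))  # False
```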
Also:
if a == b:
    return True
else:
    return False
is a horribly bloated way of writing
return a == b
> Try your code with u"abcd\xa1" ... it says it's ASCII.
Ah? On my computer, it says "False".
@-salutations
--
MCi
Perhaps your computer has a problem. Mine does this with both Python
2.7 and Python 2.3 (which introduced the unicodedata.normalize
function):
>>> import unicodedata
>>> t1 = u"abcd\xa1"
>>> t2 = unicodedata.normalize('NFD', t1)
>>> t3 = t2.encode('ascii', 'replace')
>>> [t1, t2, t3]
[u'abcd\xa1', u'abcd\xa1', 'abcd?']
>>> map(len, _)
[5, 5, 5]
>>>
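The same comparison as a script, for anyone who wants to reproduce it. This is a sketch: nfd_length_check is a made-up name for the length-comparison approach posted earlier, and the code is written so that it also runs on Python 3.

```python
import unicodedata

def nfd_length_check(struni):
    # The approach under discussion: normalize to NFD, encode with 'replace',
    # then compare lengths.
    strasc = unicodedata.normalize('NFD', struni).encode('ascii', 'replace')
    return len(struni) == len(strasc)

# U+00A1 has no decomposition, so 'replace' swaps it for a single '?'
# and the lengths still match: a false positive.
print(nfd_length_check(u"abcd\xa1"))  # True, although the string is not ASCII

# U+00EA decomposes into 'e' + U+0302, so the encoded form is one longer.
print(nfd_length_check(u"abcd\xea"))  # False
```

The length test only catches characters that happen to decompose into more than one code point, which is why the two inputs give different answers.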